<html>
<!-- this is now an html file - properly displaying the information in markdown proved impossible -->
<body>
<pre>
*********************************************************************
                    Kernel structs quick notes
*********************************************************************

Sketch of some of the structures used for Linux OS introspection.
Only relevant fields appear in the sketch. Exact offsets have to be
extracted for each kernel variant using the <a href="utils/kernelinfo">kernelinfo</a> module.

These internal structures may change from kernel to kernel. This means
that kernelinfo and osi_linux have to also be updated. The indicated
versions are approximate. If you have successfully tested a variant
with another kernel version, please share the information through a
pull request.

Notation:
  - Irrelevant fields are omitted.
  - Double braces ({{ }}) are used to expand pointers to structs.
  - When struct member names appear in text, they are enclosed in
    backticks.
  - ppXXX refer to relevant pages in <a href="http://shop.oreilly.com/product/9780596005658.do">Understanding the Linux Kernel</a>
    (3rd Edition).


    ----------------------------------------------------------
            Task Struct and Relatives: Linux 3.2-3.16
    ----------------------------------------------------------
struct task_struct {                           /* <a href="https://github.com/torvalds/linux/blob/v3.2/include/linux/sched.h#L1220">sched.h</a> */
    void *stack;                               /* pointer to process task */
    struct mm_struct *mm {{                    /* memory descriptor: <a href="https://github.com/torvalds/linux/blob/v3.2/include/linux/mm_types.h#L289">mm_types.h</a>, pp353 */
        pgd_t *pgd;                            /* page directory */
        struct vm_area_struct *mmap {{         /* memory regions list: <a href="https://github.com/torvalds/linux/blob/v3.2/include/linux/mm_types.h#L201">mm_types.h</a> */
            struct mm_struct *vm_mm;           /* address space we belong to (not used for OSI) */
            struct vm_area_struct *vm_next;    /* list of VM areas, sorted by address */
            struct vm_area_struct *vm_prev;    /* list of VM areas, sorted by address */
            unsigned long vm_start;            /* start address within vm_mm */
            unsigned long vm_end;              /* first byte after our end within vm_mm */
            unsigned long vm_flags;            /* RWXS flags */
            struct file *vm_file {{            /* file we map to (can be NULL): <a href="https://github.com/torvalds/linux/blob/v3.2/include/linux/fs.h#L964">fs.h</a>, pp471 */
                struct path f_path {{          /* not pointer!: <a href="https://github.com/torvalds/linux/blob/v3.2/include/linux/path.h#L7">path.h</a> */
                    struct vfsmount *mnt {{    /* mounted fs containing file: <a href="https://github.com/torvalds/linux/blob/v3.2/include/linux/mount.h#L55">mount.h</a>, pp486 */
                        struct vfsmount *mnt_parent;
                        struct dentry *mnt_mountpoint;    /* dentry of mountpoint: <a href="https://github.com/torvalds/linux/blob/v3.2/include/linux/dcache.h#L116">dcache.h</a>, pp475 */
                        struct dentry *mnt_root;          /* root of the mounted tree: <a href="https://github.com/torvalds/linux/blob/v3.2/include/linux/dcache.h#L116">dcache.h</a>, pp475 */
                    }};
                    struct dentry *dentry {{   /* where file is located on fs: <a href="https://github.com/torvalds/linux/blob/v3.2/include/linux/dcache.h#L116">dcache.h</a>, pp475 */
                        struct qstr d_name {{  /* not pointer!: <a href="https://github.com/torvalds/linux/blob/v3.2/include/linux/dcache.h#L35">dcache.h</a> */
                            unsigned int len;
                            const unsigned char *name;
                        }};
                        unsigned char d_iname[DNAME_INLINE_LEN]; /* should not be used directly! when the name is small enough, d_name->name will point here. */  
                    }}; /* dentry */
                }}; /* path /*
            }}; /* file */
        }}; /* vm_area_struct */
    }}; /* mm_struct */
    struct files_struct *files {{               /* open files information: <a href="https://github.com/torvalds/linux/blob/v3.2/include/linux/fdtable.h#XXX">fdtable.h</a>, ppXXX */
        struct fdtable *fdt {{                  /* ??? this may point to fdtab -- VERIFY : <a href="fdtable">fdtable.h</a> */
            struct file **fd {{                 /* current fd array: <a href="XXX">XXX</a> */
            }};
        }};
        struct fdtable fdtab {{                 /* not pointer!: <a href="https://github.com/torvalds/linux/blob/v3.2/include/linux/fdtable.h#XXX">fdtable.h</a> */
            struct file **fd {{                 /* current fd array: <a href="XXX">XXX</a> */
            }};
        }};
    }}; /* files_struct */
} /* task_struct */
<!-- fdt seems to point to fdtab in general. for a few tasks it doesn't.
see

print the tasks where fdt does not point to fdtab:
grep lul f  | grep '0$' | awk '{print $3}' | sort | uniq -c

see if for any of these tasks, fdt points to fdtab at some point:
for t in $(grep lul f  | grep '0$' | awk '{print $3}' | sort | uniq); do grep  "lul.*$t.*1$" f; done
-->

    ----------------------------------------------------------
       Kernel Task List (should be stable across versions)
    ----------------------------------------------------------
The kernel task list can be obtained in two ways:

  1. Using `next_task_struct` which uses the global symbol for
     `init_task` as the starting point.
  2. Using the `current_task_struct` structure. This is slightly
     more complicated, as the starting point (as returned from
     our `get_task_struct()` function) can belong to either a
     process or a thread.
     In either case, it is guaranteed that the next pointer in
     `task_struct` will point to a process (either the next
     process `task_struct` or the `init_task`).

Following, is an illustration of the how the process/thread list
works. There are two fields of interest in each `task_struct`:

  1. The `next` pointer that points to the next `task_struct`.
  2. The `thread_group` field, which is of type:
        `struct list_head { list_head* next, prev; }`
     This means `task_struct.thread_group` will automatically
     give you access to the value of `next`. The tricky detail
     here is that `next` is the address of the `thread_group`
     field of the next `task_struct` in the same thread group.
     See the figure below.

For this example, lets assume that we have two running processes,
with pids 30 and 31. 31 is single threaded and 30 is multi-threaded,
with two additional threads 32 and 33.
Given that we always have `init_task`, we should have a total of 5
`task_structs`: 1 for `init`, 2 for the processes, and 2 for the
threads. The process list for this example would look like this:

,--------------------------------------------------------------------,
|     _____________          _____________          _____________    |
|---&gt; | pid = 0   |    ,---&gt; | pid = 30  |    ,---&gt; | pid = 31  |    |
|     | tgid = 0  |    |     | tgid = 30 |    |     | tgid = 31 |    |
|     | next      | ---'     | next      | ---|     | next      | ---'
| ,-&gt; | t_group   | -,   ,-&gt; | t_group   | -, | ,-&gt; | t_group   | --,
| |   |___________|  |  /    |___________|  | | |   |___________|   |
| '------------------' /                    | | '-------------------'
|                     / ,-------------------' |
|                    |  |    _____________    |
|                    |  |    | pid = 32  |    |
|                    |  |    | tgid = 30 |    |
|                    |  |    | next      | ---' (points to real next)
|                    |  '--&gt; | t_group   | --,
|                    |       |___________|   |
|                    |  ,--------------------'
|                    |  |    _____________
|                    |  |    | pid = 33  |
|                    |  |    | tgid = 30 |
|                    |  |    | next      | ----, (points to init_task)
|                    |  '--&gt; | t_group   | --, |
|                    |       |___________|   | |
|                    '-----------------------' |
'----------------------------------------------'

To sum up:

  1. `thread_group.next` (represented by `t_group`) points to the
      next `thread_group` field.
  2. The `next` field of a `task_struct` is guaranteed to point
     either to:
      * the `task_struct` of the next process in the list.
      * the `task_struct` of the `init_task`.
  3. `pid`s are always unique. This is why in the figure the
     `pid`s of the three threads of the multi-threaded process
     are 30 (the main thread), 32 and 33 (the other two threads).
  4. The `tgid` field is shared between the threads of multi-threaded
     process, and shows the real `pid` of the process.

Note that the above example does not include the `thread_info`
structure. Each `task_struct` is associated with its own
`thread_info` structure which is pointed to by the `stack` field.
The process stack pointer is stored in the `cpu_context` field of
the `thread_info` struct. More info on the `cpu_context` and
`copy_thread()` (called from `copy_process()` called from
`do_fork()`) can be found in <a href="https://github.com/torvalds/linux/blob/v3.2/arch/x86/kernel/process.c">arch/ARCH/kernel/process.c</a>.

</pre>
</body>
</html>
